# MSTICPy - Microsoft Threat Intelligence Center Jupyter & Python Security Tools

msticpy is a library for InfoSec investigation and hunting in Jupyter Notebooks. It includes functionality to:
- query log data from multiple sources
- enrich the data with Threat Intelligence, geolocations and Azure resource data
- extract Indicators of Activity (IoA) from logs and unpack encoded data
- perform sophisticated analysis such as anomalous session detection and time series decomposition
- visualize data using interactive timelines, process trees and multi-dimensional Morph Charts

It also includes some time-saving notebook tools such as widgets to set query time boundaries, select and display items from lists, and configure the notebook environment.

Source Code: https://github.com/microsoft/msticpy
Python Package: https://pypi.org/project/msticpy/
Docs: https://msticpy.readthedocs.io/en/latest/

## Why use MSTICPy?

Libraries such as MSTICPy bundle a wide range of functionality that you might want to use in a notebook and make it available in an easy-to-access way. This saves you significant time writing code, working out how specific APIs behave, and converting data so that it works between functions and services.
Whilst there are other libraries that can do *some* of what MSTICPy does, MSTICPy provides all of these features in one place, with an integrated data model and configuration.

<div style="color: Black; background-color: Red; border: solid; padding: 5pt;"><b>
Note:</b> This notebook has deliberate errors in it for the purpose of teaching how to troubleshoot them. Executing the notebook as is will fail.
</div>

## Installing and importing packages in Python

To use any library in Python you first need to install the package and import it.
There are several ways to do this depending on how you want to access the library; however, the simplest is to use pip. [Pip](https://pypi.org/project/pip/) is the package installer for Python and makes finding and installing Python packages simple.
You can use pip to install packages via the command line or, if you are using a notebook, directly in a notebook cell. Azure ML compute comes with pip installed already, but if you are running your notebook elsewhere you may need to install pip first.

To do this, use `%pip` followed by `install` and the package name, e.g.:
`%pip install requests`

<div style="color: Black; background-color: Khaki; border: solid; padding: 5pt;"><b>
Note:</b> `%pip` is what's called a magic function in Jupyter. It tells the notebook to use pip to install the package in the notebook's compute environment.
</div>

In [None]:
%pip install requests

In [None]:
%pip install requests==2.2

If you have a package installed but want to update it to the latest version, add the `--upgrade` option:

In [None]:
%pip install requests --upgrade

<div style="color: Black; background-color: Khaki; border: solid; padding: 5pt;"><b>
Note:</b> Once you have installed a package, it's a good idea to restart the kernel. This ensures that when you import the package you are using the version you just installed.
</div>

<div style="color: Black; background-color: skyblue; border: solid; padding: 5pt;"><b>
Note:</b> During installation of packages you may see some warnings related to package dependencies. This happens because some packages require other packages to be installed, and sometimes these requirements clash (e.g. package 1 requires package A version 1.1, but package 2 requires package A version 1.2). Often these warnings do not cause significant issues, so try running the notebook and see whether it executes correctly.
</div>

![Example error message](1.png)
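
After installing or upgrading, it can help to confirm which version is actually present in the notebook's environment. The following is a minimal sketch using only the Python standard library; the helper name `installed_version` is made up for this illustration and is not part of pip or MSTICPy.

```python
from importlib import metadata
from typing import Optional


def installed_version(package_name: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Report on the packages used in this notebook.
for name in ("requests", "msticpy"):
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed'}")
```

If the version shown is not the one you expected, restarting the kernel and re-running the check is a reasonable first step.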



### Importing

Once a package has been installed you need to import some or all of it.

This is done with the `import` statement.

Generally there are 2 ways to import things in Python:
- `import <package>` - this imports the whole package; you then access its contents as `<package>.<item>`
- `from <package> import <item>` - this imports a specific item from the package

You can also import packages and rename them to make them easier to call later:
`import <package> as <alias>`
e.g. `import pandas as pd`

In [None]:
import pandas as pd

In [None]:
import xyz

In [None]:
%pip list

Some packages do not use the same name for installation and import. You may need to check the package documentation to make sure you are importing the correct name.

In [None]:
%pip install scikit-learn

In [None]:
import sklearn
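
When an `import` fails with `ModuleNotFoundError`, a quick way to see whether Python can find a module under a given name is to ask the import system directly. This is a small sketch using the standard library; `can_import` is a hypothetical helper name and it is only intended for top-level module names such as `sklearn`.

```python
import importlib.util


def can_import(module_name: str) -> bool:
    """Return True if a top-level module with this name can be located."""
    return importlib.util.find_spec(module_name) is not None


# scikit-learn is installed as "scikit-learn" but imported as "sklearn".
for module_name in ("sklearn", "xyz"):
    status = "importable" if can_import(module_name) else "not found"
    print(f"{module_name}: {status}")
```

A "not found" result usually means either the package is not installed in this kernel's environment or the import name differs from the install name.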

## Installing and Importing MSTICPy
Now that we have seen the fundamentals of installing and importing, let's install and import MSTICPy:

In [None]:
# Install the latest version of MSTICPy
%pip install msticpy --upgrade

Don't forget to restart that kernel!

Now, we could import MSTICPy as a whole with `import msticpy`; however, it's a big package with a lot of features. To make things easier there is a module called `nbinit` whose `init_notebook` function runs a number of checks to make sure the environment is good and handles the key imports and setup for us.

In [None]:
from msticpy.nbtools import nbinit
nbinit.init_notebook(
    namespace=globals()
)

<div style="color: Black; background-color: Green; color: white; padding: 5pt;"><b>
Great! We are now ready to get going.</b>
</div>


## MSTICPy's config file

MSTICPy can handle connections to a variety of data sources and services, including Azure Sentinel.

To make it easier to manage and re-use the configuration and credentials for these services, MSTICPy has its own config file that holds these items: `msticpyconfig.yaml`.

When you launched this notebook from Azure Sentinel it copied a basic configuration file - `config.json` -
to your workspace folder.<br>
You should be able to see this file in the file browser to the left.<br>
This file contains details about your Azure Sentinel workspace but has
no configuration settings for other external services that we need.

If you didn't have a `msticpyconfig.yaml` file in your workspace folder (which is likely
if this is your first use of notebooks), the `init_notebook` function should have created
one for you and populated it
with the Azure Sentinel workspace data taken from your config.json.

<p style="border: solid; padding: 5pt; color: white; background-color: DarkOliveGreen"><b>Tip:</b>
If you do not see a "msticpyconfig.yaml" file in your user folder, click the refresh button<br>
at the top of the file browser.
</p>

We can check this now by opening the settings editor and viewing the settings.

<div style="color: Black; background-color: Khaki; border: solid; padding: 5pt;"><b>
You should not have to change anything here unless you need to add
one or more additional workspaces.</b></div>
<p/>

When you have verified that this looks OK, click **Save Settings**.



In [None]:
from msticpy.config import MpConfigEdit
import os

mp_conf = "msticpyconfig.yaml"

# check if MSTICPYCONFIG is already an env variable
mp_env = os.environ.get("MSTICPYCONFIG")
mp_conf = mp_env if mp_env and Path(mp_env).is_file() else mp_conf

if not Path(mp_conf).is_file():
    print(
        "No msticpyconfig.yaml was found!",
        "Please check that there is a config.json file in your workspace folder.",
        "If this is not there, go back to the Azure Sentinel portal and launch",
        "this notebook from there.",
        sep="\n"
    )
else:
    mpedit = MpConfigEdit(mp_conf)
    mpedit.set_tab("AzureSentinel")
    display(mpedit)
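
The cell above decides which configuration file to open: it prefers the file named in the `MSTICPYCONFIG` environment variable, if that file exists, and otherwise falls back to `msticpyconfig.yaml` in the current folder. The sketch below isolates just that lookup so it can be run without MSTICPy; `resolve_config_path` is a hypothetical helper written for this illustration, not an MSTICPy function.

```python
import os
from pathlib import Path


def resolve_config_path(default: str = "msticpyconfig.yaml") -> Path:
    """Prefer the file named in MSTICPYCONFIG if it exists, else use the default."""
    env_value = os.environ.get("MSTICPYCONFIG")
    if env_value and Path(env_value).is_file():
        return Path(env_value)
    return Path(default)


config_path = resolve_config_path()
print(f"Using {config_path} (exists: {config_path.is_file()})")
```

If the printed path is not the file you expected, check the value of `MSTICPYCONFIG` before looking for other problems.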

We are going to use [VirusTotal](https://www.virustotal.com) (VT) as an example of a popular threat intelligence source.
To use VirusTotal threat intel lookups you will need a VirusTotal account and API key.

You can sign up for a free account via the
[VirusTotal getting started page](https://developers.virustotal.com/v3.0/reference#getting-started).

If you are already a VirusTotal user, you can, of course, use your existing key.

<p style="border: solid; padding: 5pt; color: black; background-color: Khaki">
<b>Warning</b> If you are using a VT enterprise key we do not recommend storing this
in the msticpyconfig.yaml file.<br>
MSTICPy supports storage of secrets in
Azure Key Vault. You can read more about this
<a href=https://msticpy.readthedocs.io/en/latest/getting_started/msticpyconfig.html#specifying-secrets-as-key-vault-secrets >in the MSTICPY docs</a><br>
For the moment, you can sign up for a free account, until you can take the time to
set up Key Vault storage.
</p>


As well as VirusTotal, we also support a range
of other threat intelligence providers: https://msticpy.readthedocs.io/en/latest/data_acquisition/TIProviders.html
<br><br>

To add the VirusTotal details, run the following cell.

1. Select "VirusTotal" from the **Add prov** drop down
2. Click the **Add** button
3. In the left-side Details panel select **Text** as the Storage option.
4. Paste the API key in the **Value** text box.
5. Click the **Update** button to confirm your changes.

Your changes are not yet saved to your configuration file. To
do this, click on the **Save Settings** button at the bottom of the dialog.

If you are unclear about what anything in the configuration editor means, use the **Help** drop-down. This
has instructions and links to more detailed documentation.

In [None]:
mpedit.set_tab("TI Providers")
mpedit

Our notebooks commonly use IP geo-location information. 
In order to enable this we are going to set up [MaxMind GeoLite2](https://www.maxmind.com)
to provide geolocation lookup services for IP addresses.

GeoLite2 uses a downloaded database which requires an account key to download.
You can sign up for a free account and a license key at
[the MaxMind signup page](https://www.maxmind.com/en/geolite2/signup).
<br>

<details>
    <summary>Using IPStack as an alternative to GeoLite2...</summary>
    <p>
    For more details see the
    <a href=https://msticpy.readthedocs.io/en/latest/data_acquisition/GeoIPLookups.html >
    MSTICPy GeoIP Providers documentation</a>
    </p>
</details>
<br>

Once you have an account, run the following cell to add the MaxMind GeoLite2 details to your configuration.

The procedure is similar to the one we used for VirusTotal:

1. Select the "GeoIPLite" provider from the **Add prov** drop-down
2. Click **Add**
3. Select **Text** Storage and paste the license (API/Auth) key into the text box
4. Click **Update**
5. Click **Save Settings** to write your settings to your configuration.


In [None]:
mpedit.set_tab("GeoIP Providers")
mpedit

## Validate your settings

- click on the **Validate settings** button.

You may see some warnings about missing sections, but you should not see any about the Azure Sentinel, TI Providers or GeoIP Providers settings.

Click on the **Close** button to hide the validation output.

If you need to make any changes as a result of the validation,
remember to save your changes by clicking the **Save Settings** button.

In [None]:
msticpy.settings.refresh_config()

## Getting Data From Azure Sentinel

Now that the setup is out of the way, we can focus on getting data from Azure Sentinel.

In [None]:
!az login

Querying data from Azure Sentinel is handled by MSTICPy's `QueryProvider`. The first step is to initialize a `QueryProvider` and tell it we want to use the Azure Sentinel query provider.

The other thing we need to give the `QueryProvider` is details of the workspace we want to connect to. We *could* do this manually, but it's much easier to get the details from the configuration we set up earlier. We can do this with `WorkspaceConfig`:

In [None]:
from msticpy.nbtools import nbinit
nbinit.init_notebook(namespace=globals())

qry_prov=QueryProvider("AzureSentinel")
ws_config = WorkspaceConfig(workspace="CyberSecDemo")

What `WorkspaceConfig` is doing for us is creating the connection string used by the `QueryProvider`:

In [None]:
ws_config.code_connect_str

Once set up we can tell the `QueryProvider` to `connect` which will kick off the authentication process. There are a number of ways that we can handle that authentication.

In [None]:
#qry_prov.connect(ws_config)
qry_prov.connect(ws_config, mp_az_auth="cli")

Now that we are connected to Azure Sentinel we can start to look at running some queries to get some data.

MSTICPy comes with a number of built in Azure Sentinel queries to get some common datasets into the Notebook. 

You can see a list of the available queries with `.list_queries()`:

In [None]:
qry_prov.list_queries()

However, a plain list like this is of limited use. To make these built-in queries more accessible and easier to find, there is a query browser that makes searching for, and learning about, these queries much easier.

In [None]:
qry_prov.browse_queries()

Now that we have found a query that we want to run, we simply call it by name on the `QueryProvider`, which returns the results of the query in a Pandas DataFrame.

In addition to running the stock query, we can customize certain elements of it, for example by appending extra query text with `add_query_items`.

In [None]:
#qry_prov.Azure.list_all_signins_geo()
#qry_prov.SecurityAlert.list_alerts('?')
qry_prov.SecurityAlert.list_alerts(add_query_items="| take 10")
#qry_prov.SecurityAlert.list_alerts(add_query_items="take 10")

We also don't need to use the built-in queries. We can write our own queries and have them executed using `.exec_query`.

In [None]:
query = "SecurityAlert | take 10"
#qry_prov.exec_query(query)
alert_df = qry_prov.exec_query(query)

In [None]:
alert_df

## Working with the data

Data returned by the `QueryProvider` comes back in a Pandas DataFrame. This provides us with a powerful and flexible way to access our data.

One of the core things we want to do is look at specific rows in our table. Each table has an index that can be used to select a row by its label using `.loc`. Alternatively, we can return a row by its position in the table with `.iloc`.

In [None]:
alert_df.loc[1]

We can also choose just to return specific columns by providing a list of them to the DataFrame:

In [None]:
alert_df.iloc[:5][["AlertName", "AlertSeverity", "Description"]]
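
The difference between `.loc` (label) and `.iloc` (position) is easiest to see on a table whose index is not in order. The example below is a small sketch on made-up data, not real alert output; the column names simply mirror the ones used above.

```python
import pandas as pd

# Toy data standing in for query results. The index labels are deliberately
# out of order so that label and position lookups return different rows.
alerts = pd.DataFrame(
    {
        "AlertName": ["Credential theft", "Port scan", "Malware detected"],
        "AlertSeverity": ["High", "Low", "Medium"],
        "Description": ["example a", "example b", "example c"],
    },
    index=[2, 0, 1],
)

by_label = alerts.loc[1]      # the row whose index label is 1
by_position = alerts.iloc[1]  # the second row, whatever its label is

# First two rows by position, restricted to two columns.
subset = alerts.iloc[:2][["AlertName", "AlertSeverity"]]

print(by_label["AlertName"])
print(by_position["AlertName"])
print(subset)
```

Note that `.loc` raises a `KeyError` if the label does not exist in the index, which is worth remembering when a query returns fewer rows than expected.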

We can also do things such as search for rows with specific data.

In [None]:
alert_df[alert_df["AlertName"].str.contains("credential theft")]
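
By default `str.contains` is case-sensitive, treats the search text as a regular expression, and returns missing values for rows where the column is empty, any of which can make a filter like the one above return nothing or fail. The sketch below, on made-up data, shows the pandas options that relax each of these; whether you want them depends on your data.

```python
import pandas as pd

# Toy data: mixed capitalisation and one missing value.
alerts = pd.DataFrame(
    {"AlertName": ["Credential Theft attempt", "Port scan", None]}
)

mask = alerts["AlertName"].str.contains(
    "credential theft",
    case=False,   # ignore upper/lower case
    na=False,     # treat missing names as "no match"
    regex=False,  # match the text literally, not as a regular expression
)
matches = alerts[mask]
print(matches)
```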

Pandas also has some features to allow you to visualize the data you have:

In [None]:
alert_df["AlertSeverity"].value_counts().plot(kind='pie')

In [None]:
alert_df["AlertSeverity"].value_counts().plot(kind='bar')

There are many, many more features in Pandas. When starting with MSTICPy, it's a good idea to spend some time learning about the power of Pandas - https://pandas.pydata.org/docs/

## Enriching data using external data sources

One of the powerful elements of Notebooks is that you can combine data from Azure Sentinel with data from other sources. One of the most common sources of this data in security is Threat Intelligence (TI) data. MSTICPy has support for a number of Threat Intelligence data sources including:
- VirusTotal
- GreyNoise
- AlienVault OTX
- IBM XForce
- Azure Sentinel TI data
- OPR (for PageRank details)
- ToR ExitNode information.

In [None]:
query = "SigninLogs | sample 100"
signin_df = qry_prov.exec_query(query)
signin_df.head()

The first step in using these TI sources is to create a `TILookup` object. This can then be used to perform lookups.

Lookups can be done against individual items via `.lookup_ioc` or against multiple items with `.lookup_iocs`.

In [None]:
ti = TILookup()
ti.lookup_iocs(signin_df, obs_col="IPAddress", providers=["GreyNoise"])

In [None]:
ti_hits = ti.lookup_iocs(signin_df, obs_col="IPAddress",providers=["GreyNoise"])
ti_hits[ti_hits["Result"]==True]

In [None]:
signin_df.set_index('IPAddress').join(ti_hits[ti_hits["Result"]==True].set_index('Ioc'), rsuffix="_", how="inner")[["TimeGenerated", "UserPrincipalName"]]
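
The join above keeps only the sign-in rows whose IP address produced a positive TI result. The same idea can be written with `merge`, which avoids re-indexing both tables. This is a sketch on made-up data: the column names `IPAddress`, `Ioc` and `Result` follow the cells above, while the addresses and user names are invented for illustration.

```python
import pandas as pd

# Invented sign-in rows (documentation-range IP addresses).
signins = pd.DataFrame(
    {
        "IPAddress": ["198.51.100.7", "203.0.113.9", "198.51.100.7"],
        "UserPrincipalName": [
            "alice@example.com",
            "bob@example.com",
            "carol@example.com",
        ],
    }
)

# Invented lookup results: one row per observable, True means a TI hit.
ti_results = pd.DataFrame(
    {"Ioc": ["198.51.100.7", "203.0.113.9"], "Result": [True, False]}
)

# Keep only the hits, then match them to sign-ins on the IP address.
hits = ti_results[ti_results["Result"]]
enriched = signins.merge(hits, left_on="IPAddress", right_on="Ioc", how="inner")
print(enriched[["IPAddress", "UserPrincipalName"]])
```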

In [None]:
vt_df = ti.lookup_iocs(signin_df["IPAddress"].unique()[:4], providers=["VirusTotal"])
vt_df

In [None]:
ti.browse_results(vt_df)

In [None]:
ti.browse_results(ti.result_to_df(ti.lookup_ioc("87.97.178.92")))

## Azure API access

MSTICPy also has integration with a range of Azure APIs that can be used to retrieve additional information or perform actions.

In [None]:
from msticpy.data.azure_sentinel import AzureSentinel

azs = AzureSentinel()
azs.connect()

In [None]:
subs = azs.get_subscriptions()
subs.head()

In [None]:
azs.get_subscription_info(subs.iloc[0]["Subscription ID"])

In [None]:
azs.get_incident(incident_id = "7a4f5e0e-c202-4298-8cb6-e1278500fbc7", sub_id = "d1d8779d-38d7-4f06-91db-9cbc8de0176f", res_grp= "soc", ws_name="cybersecuritysoc")

## Visualizations with MSTICPy  

The ability to create complex, interactive visualizations is one of the key benefits of Notebooks. Creating these visualizations from scratch can be quite complex and involve a lot of code.

To make the process easier, MSTICPy contains a number of common visualizations that can be called quickly and easily with minimal code.

### Timelines

Understanding when events occurred and in what order is a key component of many security investigations. MSTICPy has the ability to plot various types of timelines.

In [None]:
user_df = qry_prov.Azure.list_aad_signins_for_account(account_name="pdemo@seccxpninja.onmicrosoft.com")
#timeline.display_timeline(user_df)
timeline.display_timeline(user_df, source_columns=["UserPrincipalName", "ResultType"])

In [None]:
user_df.columns

In [None]:
timeline.display_timeline(user_df, source_columns=["UserPrincipalName", "ResultDescription"])   


In [None]:
ref_time = user_df["TimeGenerated"].iloc[5]
timeline.display_timeline(user_df, source_columns=["UserPrincipalName", "ResultDescription"], group_by="ResultType", ref_time=ref_time)

In [None]:
alert_df = qry_prov.SecurityAlert.list_alerts(add_query_items="| take 10")
alert_df

In [None]:
timeline_duration.display_timeline_duration(alert_df, group_by="AlertName", time_column="StartTimeUtc", end_time_column="EndTimeUtc")

In [None]:
#alert_df.mp_plot.timeline()
alert_df.mp_plot.timeline(group_by="Severity", source_columns=["AlertName", "TimeGenerated"])

MSTICPy also includes a number of interactive widgets that make it easier for users to interact with notebooks.

In [None]:
network_vendor_data_q = "CommonSecurityLog | summarize by DeviceVendor"
network_vendor_data = qry_prov.exec_query(network_vendor_data_q)
network_selector = nbwidgets.SelectItem(
    item_list=network_vendor_data["DeviceVendor"].to_list(),
    description='Select a vendor',
    action=print,
    auto_display=True
);


In [None]:
network_data_q = f"""CommonSecurityLog 
    | where DeviceVendor == '{network_selector.value}'
    | take 50"""
network_data = qry_prov.exec_query(network_data_q)
network_data.head()
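
The cell above pastes the selected value straight into the query text. If a value ever contains a single quote or a backslash, the resulting query will be malformed. The following is a minimal sketch of one way to guard against that, assuming the backslash escaping that KQL uses inside quoted string literals; `kql_string_literal` and `build_vendor_query` are hypothetical helpers written for this example and are not part of MSTICPy.

```python
def kql_string_literal(value: str) -> str:
    """Wrap a value in single quotes, escaping backslashes and single quotes."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_vendor_query(vendor: str, limit: int = 50) -> str:
    """Build the CommonSecurityLog query for one vendor."""
    return (
        "CommonSecurityLog\n"
        f"| where DeviceVendor == {kql_string_literal(vendor)}\n"
        f"| take {int(limit)}"
    )


print(build_vendor_query("Palo Alto Networks"))
```

The returned string could then be passed to `qry_prov.exec_query` in the same way as the hand-written query.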

The Matrix Plot graph in MSTICPy allows you to plot the interactions between two elements in your data.

In [None]:
network_data.mp_plot.matrix(x="SourceIP", y="DestinationIP", title="IP Interaction")

In [None]:
q_times = nbwidgets.QueryTime(units='day', max_before=20, before=5, max_after=1)
q_times.display()
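
The widget above lets you pick a time range relative to an origin time. As a rough illustration of the arithmetic involved (not of how the widget is implemented), the sketch below computes a start and end time from an origin and two offsets in days. `query_window` is a hypothetical helper, and the one-day "after" offset is an assumption chosen for the example.

```python
from datetime import datetime, timedelta, timezone


def query_window(origin: datetime, before_days: int, after_days: int):
    """Return (start, end) datetimes around an origin time."""
    start = origin - timedelta(days=before_days)
    end = origin + timedelta(days=after_days)
    return start, end


# Five days back and one day forward from the current UTC time.
start, end = query_window(datetime.now(timezone.utc), before_days=5, after_days=1)
print(f"start: {start.isoformat()}")
print(f"end:   {end.isoformat()}")
```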

In [None]:
print(123)

In [None]:
security_alerts = qry_prov.SecurityAlert.list_alerts(add_query_items="| take 10")
alert_select = nbwidgets.SelectAlert(alerts=security_alerts, action=nbdisplay.display_alert)
display(Markdown('### Alert selector with action=DisplayAlert'))
display(HTML("<b> Alert selector with action=DisplayAlert </b>"))
alert_select.display()

## What to do next:

- Run the Getting Started Notebook in Azure Sentinel
    - This will help you get your config set up
- Try the MSTICPy Lab - https://aka.ms/msticpy-demo
- Go and read the docs - https://msticpy.readthedocs.io/en/latest/GettingStarted.html
- Learn more about Pandas - https://pandas.pydata.org/docs/
- Check out our other notebooks for ideas! - https://github.com/Azure/Azure-Sentinel-Notebooks


